The Everlasting Database: Statistical Validity at a Fair Price

Authors

  • Blake Woodworth
  • Vitaly Feldman
  • Saharon Rosset
  • Nathan Srebro
Abstract

The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. Crucially, we guarantee statistical validity without any assumptions on how the queries are generated. We also ensure with high probability that the cost for M non-adaptive queries is O(log M), while the cost to a potentially adaptive user who makes M queries that do not depend on any others is O( √

Similar resources

Estimation of quantitative characteristics considering CPI microdata in Iran

The aim of this study is to estimate the known statistical characteristics of nominal price stickiness in the Iranian economy during the years 1390 to 1399 at different commodity levels of consumer price index microdata (product category, COICOP commodity group, and the whole economy), and to compare stickiness across categories and product groups. For this purp...

Analytical Review of Fair Distribution of Recreational and Sport Services in by Using Topsis Model

Background. Fair distribution of sports facilities strongly influences citizens' tendency to exercise. Therefore, the distribution of sports and recreational facilities in cities should be carefully and scientifically explored. Objectives. The purpose of this study was an analytical review of the fair distribution of recreational and sport services in the city of Mashhad using the TOPSIS model....

Relating Fairness and Timing in Process Algebras

This paper contrasts two important features of parallel system computations: fairness and timing. The study is carried out at specification system level by resorting to a well-known process description language. The language is extended with labels which allow to filter out those process executions that are not (weakly) fair (as in [5,6]), and with upper time bounds for the process activities (...

Analyzing the Factors Affecting on Price Premium to Ecotourism (Case Study: Isfahan Mesr Desert)

Desert tourism is part of the tourism industry; trips and hikes in desert and wasteland areas have created a specific type of tourism called desert tourism. Recognizing the factors that affect willingness to pay a price premium for ecotourism can lead a tourism destination to success. Therefore, the intention of this research is to identify and rank the factors that affect paying Pric...

Stock Market Fraud Detection, A Probabilistic Approach

In order to have a fair market condition, it is crucial that regulators continuously monitor the stock market for possible fraud and market manipulation. There are many types of fraudulent activities defined in this context. In our paper we will be focusing on "front running". According to the Association of Certified Fraud Examiners, front running is a form of insider information and thus is very ...

Publication date: 2018